Description

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

[0001] The present invention relates generally to 
methods and apparatuses for encapsulating informa- 
tion, identifying the information, representing the infor- 
mation, and facilitating the transfer of the information be- 
tween users, between remote storage and an originat- 
ing user, or between remote storages using computers 
and digital telecommunication networks. 

2. Description of the Related Art 

[0002] Digital information must often be identified to 
be in a particular state, denoted by the status of an asset 
(such as a file) as of some event or time. Such assets 
include traditional data files, multimedia files and frag- 
ments, records from structured databases, or any other 
string of digital information used wholly or in part by 
some application or device. Digital information is highly 
subject to change and few methods are available to in- 
spect the contents of the digital information to reliably 
recognize whether it has been changed since some pri-
or time or event. Normal attempts to improve or perfect
the content, inadvertent commands or actions which 
change the content, or tampering by others unknown to 
primary owners of the digital information are difficult to 
detect. As such, computer users have no convenient
mechanism for establishing the origin or integrity of par- 
ticular content versions. 

[0003] Another problematic attribute of digital infor- 
mation (such as a computer file) is that copies may exist 
which are identical in content but differ in the meta data 
that the computer system uses to relate to the digital 
information. Such meta data includes the date/time re- 
corded for the creation or last modification of the file, the 
file name associated with the file and other information. 
The meta data may imply that otherwise identical copies 
of digital information are different when in fact they are 
not. Such confusion makes it difficult to avoid unneces- 
sary duplication of content on a single computer or on 
a collection of computers on a network. This confusion 
may also result in the unnecessary copying of such data
files across networks or from other media when, in fact, 
a particular data file needed is readily available on a 
computer system or network already. 
[0004] The existence of a particular file under multiple 
names has a counterpart problem. Data on computer 
systems can generally only be accessed through iden- 
tifiers or location mechanisms which to a greater or less- 
er extent include information about the location of the 
file in the storage of the computer. That means that a 
user accesses the data through stored or remembered 
names which include elements which are readily 
changed by others. For example, files within a sub-di-
rectory are at risk if someone changes the sub-directory
name. If changed, the path to a file becomes invalid, and
all of the stored or remembered names of files become
invalid as well. This fragile approach of locating data
by storage location leads to many kinds of problems for users
and administrators of computer systems, particularly 
those working with networked systems. 
[0005] Finally, there is no convenient way for compu-
ter users to identify collections of specific versions of
digital files. No robust mechanism exists for computers
or their users to refer to collections of specific copies or
versions of digital files without creating a new entity
which incorporates copies of the files into a new form.
Many mechanisms have been created to combine such
copies into what are commonly called archive files. Ex-
amples of archive utilities include the "tar" archiving fa-
cility common on UNIX systems and the various "zip"
programs on personal computers. Such solutions create
additional copies which are often proliferated to many
systems. The difficulty of such solutions is that often ex-
act digital copies of many of the files in an archive are
already present on the systems to which they are cop-
ied. In fact, on many computer systems there are many
copies of digital files whose contents are exactly the
same. This duplication of identical content is difficult to
avoid using existing techniques.
[0006] The result of these problems is that duplicate
copies of digital files are frequently stored on computer
storage devices (at expense to the owner of the system)
or transferred on media or telecommunications devices
(at further expense to the system owner and the tele-
communications provider). This duplication strains lim-
ited resources and causes needless confusion on local
private networks (local area networks, for example) and
on collections of systems connected by digital telecom-
munication networks. One problem with extra copies is
that one might think they are different when they are in
fact the same (and copies are needlessly stored), or
when they are different, one might think they are the
same because of the same file name.

[0007] The inability of systems to reliably distinguish
different versions of files with the same identifier or to
recognize identical files with different identifiers wastes
network resources and creates confusion when files are
transferred between users of a network. Often, it is es-
sential that users know that they are working on the
same document or know that they are working with the
same version of an application. For example, when an
electronic mail (e-mail) message is sent from one user
to another, an attached computer file containing an ap-
plication or a document is often sent as well. Files may
also need to be transferred so that applications can be
distributed. Sending an e-mail message with an at-
tached file or using a point-to-point scheme in a network
to distribute files can be inefficient in terms of the amount
of network bandwidth that is used. For example, when
a user attaches a number of files to an e-mail message,
it may be that a copy of one or more of those files is
already stored on the intended recipient's hard drive. In
such a case, the network bandwidth used to transfer the
attached files is wasted. If the files could be reliably iden-
tified and the files' contents could be reliably verified,
then the recipient could simply retrieve the files from his
own hard drive or from a local network server and verify
that they are indeed the files that are attached to the e-
mail message.

[0008] A similar problem occurs in managing comput- 
ers on a network and making sure that the computers 
are configured in a certain way with certain applications. 
For example, when a small change is made to an oper- 
ating system or to hardware that is available to the net- 
work, certain files may need to be transferred to each 
computer on the network. A given computer may have 
most or almost all of the necessary files loaded and only 
a few files may need to be provided or updated from a 
central source. In many cases, the requesting computer 
and the source computer are far from one another and 
are connected by a data link that operates at a slower 
speed than a local data link would operate. Currently, it 
is necessary to keep track of both the files that are on 
the requesting computer and the files that need to be 
added so that proper updates can be made. It would be 
useful if there existed a way to specify all of the files that 
are to be transferred and to encapsulate that specifica- 
tion in such a way that would allow the files to be re- 
trieved from the most convenient place (locally, if pos- 
sible). It would further be useful if such a method would 
allow the files to be reliably verified as the correct files. 
[0009] When files are distributed on a local area net- 
work (LAN) from a source outside the LAN, the problem 
can be even more serious. For example, when a com- 
pany such as Netscape Communications Corporation 
provides a new web browser on their web site, hundreds 
or even thousands of employees at a single company 
attempt to download the browser from Netscape's web 
site. This is perhaps the most inefficient way for the re- 
quired software to be distributed within a company. It 
would be more efficient, for example, if one coworker 
could reliably retrieve needed files from another. If the 
necessary files could be somehow uniquely identified in 
a manner that would allow the actual data in the files to 
be obtained from the most convenient source, then all 
of the outside bandwidth used up when all the users 
download files from an outside source could be saved. 
In addition, users would obtain access to the files much 
faster as well. 

[0010] The problem of specifying a set of files to be 
stored on various computers and ensuring that the cor- 
rect files are stored on the computers in a network is 
described in United States Patent No. 5,581,764 issued
to Fitzgerald et al. Fitzgerald teaches a method of dis- 
tributing resources over a computer network. The meth- 
od involves generating Already Have and Should Have 
lists for each of the computers on the network and com- 
paring a Last Updated Date/Time (LUDT) field in the 
Should Have list to a Last Synchronized Date/Time 
(LSDT) in the Already Have list. The differences be-
tween Should Have lists and Already Have lists for indi-
vidual computers are used to determine which items
must be compared to update individual desktops. This 
mechanism is dependent on the integrity of system 
clocks and date settings which are unreliable due to ac- 
cidental or malicious entry of false settings. Further- 
more, the mechanism fails in principle when dealing with 
the identification of identical files from different systems. 
An alternative to the Fitzgerald method that would not 
require detailed comparisons of update and synchroni- 
zation times yet would still allow files to be reliably spec- 
ified and would allow needed files to be reliably identified 
would be useful. 

[0011] United States Patent No. 5,710,922 issued to 
Alley et al. describes a method for synchronizing and 
archiving information between computer systems. The 
records are identified with a unique identification indicia 
and an indicia that indicates the last time that the record 
was altered. Using the time of the last synchronization 
information, each of the selected records that was add- 
ed to or deleted from one of the computer systems since 
the last synchronization is identified and added to or de- 
leted from the computer system. Certain techniques and 
operations can falsely indicate changes to records 
which have not, in fact, changed. Furthermore, identical 
copies of digital files on different systems are not readily 
recognized as the same because Alley provides no
mechanism to do so. Again, it would be
useful if a method for synchronizing file systems could 
be developed that would not require or depend upon 
analysis of update and synchronization times. 
[0012] In general, there is a need for a more reliable, 
flexible and verifiable way of specifying states of known 
data assets (such as computer files) and of providing 
access to those unique data assets, particularly over 
networks. Currently, network sites that are sources of 
data may be mirrored and various load-balancing 
schemes have been devised for distributing load among 
servers that provide data. However, no truly distributed 
system has been devised for sharing and providing ac- 
cess to data whereby data may be reliably and automat- 
ically retrieved from any place where it may be found on 
a network, instead of from specified locations which are 
designed to store and provide access to data. 
[0013] In view of the foregoing, there is a need for 
methods and apparatuses that reliably and verifiably 
transfer files while allowing the site that is receiving the 
files to obtain the files from the most convenient source. 
Further, it is desirable for such techniques to obtain files 
in an efficient manner, to obtain the files locally if possi- 
ble, and to verify that the content of an obtained file is 
the same as the content of the file that is intended to be 
transferred. There is also a need for methods and ap- 
paratuses that minimize the data stored or transferred 
within a system or network. It would be desirable for 
such techniques to provide a reliable mechanism for 
identifying, locating, and accessing data by its contents 
rather than by exclusively using the meta data tradition-
ally stored on computer systems.

SUMMARY OF THE INVENTION 

[0014] Accordingly, a system and method are dis-
closed for representing digital information in an electron-
ic paperclip, or "e-CLIP" (tm). An e-CLIP is a reproduc-
ible, reliably unique identifier for a collection of digital
information, derived from the content of the digital infor-
mation itself. In one embodiment, an e-CLIP is an alpha-
numeric reference. An e-CLIP may represent a file, a
group of files, a group of file identifiers, or other collec-
tions of data or database information. Such other col-
lections of data or database information might include
selected database records from a relational, hierarchic,
network or other format database, selected frames or
clips from digital audio or video streams, messages from
streams of message records or files, log entries from
audit or status logs of systems, subsystems and/or ap-
plications or any other digital assets, the status of which
at some instance in time is of unique importance in some
context. The original form or context of each digital asset
is irrelevant so long as applications provide each such
asset uniquely to the mechanism or system embodying
this invention. It operates on each unique asset and as-
sociated meta-data as described to produce a unique
and useful identifier which enables creation of persistent
storage of the related assets for future reproduction of
the originals.
[0015] A cryptographic hash function is used to com-
pute an identifier for the data being represented. Each
binary asset is treated as a potentially unique binary se-
quence. That is to say that any binary entity has a series
of binary digits which, in sequence, follow a potentially
unique pattern of finite length. Thus, a binary asset at
an instant in time is a binary sequence which may or
may not be unique. The use of a cryptographic hash
function establishes a digital fingerprint or signature that
virtually uniquely identifies the binary sequence. The
cryptographic hash binary sequence identifier is also re-
ferred to as a content-addressable or content-based
name for the data. When a group of files or other digital
assets is represented, an identifier is generated for each
of the files using a cryptographic hash function and
placed in a descriptor file. The descriptor file also in-
cludes meta data such as arbitrary directory structure
(including relational or hierarchical relationships) infor-
mation as well as file, record, or other asset meta data
such as file, record, or asset name, size, date and time
stamps and other descriptive data or attributes. In addi-
tion, the descriptor file includes context information
about the creation of the collection (time and date of cre-
ation, user ID of the creating user, etc.). A cryptographic
hash descriptor file identifier (or descriptor file hash) is
then computed for the descriptor file.
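
The two-level hashing described above can be sketched as follows. This is a minimal illustration and not the format used by the invention: the JSON layout, the field names, and the helper names `content_id` and `build_descriptor` are assumptions made for the example. Only the use of MD5 over asset content, and then over the descriptor file itself, comes from the text.

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    """Return the MD5 digest of the raw content as a hex string."""
    return hashlib.md5(data).hexdigest()


def build_descriptor(assets: dict, creator: str, created: str) -> bytes:
    """Serialize a descriptor: one entry per asset plus context information.

    `assets` maps a file name to its content bytes. The JSON layout is a
    hypothetical stand-in for the descriptor file format.
    """
    entries = [
        {"name": name, "size": len(data), "id": content_id(data)}
        for name, data in sorted(assets.items())
    ]
    descriptor = {
        "context": {"creator": creator, "created": created},
        "assets": entries,
    }
    # Deterministic serialization, so an identical descriptor always
    # produces an identical descriptor file hash.
    return json.dumps(descriptor, sort_keys=True).encode("utf-8")


assets = {"a.txt": b"hello", "copy_of_a.txt": b"hello", "b.txt": b"world"}
descriptor_bytes = build_descriptor(assets, "user1", "1999-01-01T00:00:00")
descriptor_hash = content_id(descriptor_bytes)
```

Here `a.txt` and `copy_of_a.txt` receive the same identifier because their content is identical, even though their names differ.
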
[0016] An e-CLIP includes the descriptor file hash and
may also include a file locator such as a file name or
URL that gives a source where the descriptor file may 
be obtained if it is not found locally in a convenient stor- 
age location. The binary sequence hashes and descrip- 
tor file hashes (a special case of binary sequence hash) 
are provably unique identifiers of the relevant binary se- 
quences. As such, they form a foundation for the storage 
and retrieval of those sequences as files, database 
records, or other digital entities using the hashes as as- 
set identifiers (keys, locators or other mechanism). Such 
an approach can be said to provide "content address-
able" storage as the hash is derived from the binary se-
quence itself, the digital content.
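
As a rough sketch of content-addressable storage in this sense, the following in-memory store uses the hash of each binary sequence as its only key. The class name `ContentStore` and its methods are hypothetical and chosen for illustration; a real repository would persist data to files or a database.

```python
import hashlib


class ContentStore:
    """In-memory store keyed by the MD5 digest of each stored byte sequence."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.md5(data).hexdigest()
        # Identical content maps to the same key, so duplicates are stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing the same content twice under different file names yields one entry, since the name plays no part in the key.
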
[0017] In one aspect, the present invention is advan- 
tageous in that meta data associated with each file/ 
record/asset, other data associated with each or all of 
those assets, and context data about the collection is 
also included in the descriptor file. Thus, when the de- 
scriptor file is obtained, the recipient also receives im- 
portant meta data about each asset and context infor- 
mation about the collection. The meta data may be used 
to further verify a file/record/asset, to indicate owner- 
ship, to show modification dates, or to provide other 
needed information about each file. In addition, an em- 
bodiment of the present invention is advantageous 
when file directory structure is also included with the file 
list. Having such file directory structure is helpful in de- 
termining how to organize files amongst their respective 
folders. For example, after data is lost on a particular 
computer, the file list can be used to not only identify lost 
files, but also to reorganize the files into the appropriate 
directory structure. Similarly, meta data about database 
records cataloged in a descriptor file can be used to 
identify tables or files to which those records pertain. 
Such parallels can be drawn with other forms of digital 
asset like audio or video clips, etc. 
[0018] An e-CLIP functions as a "key to a box of keys"
where the box of keys is the descriptor file and the keys 
are the binary sequence identifiers (or binary sequence 
hashes). The use of the term "key" has no cryptographic 
or other meaning in the context of this invention. An 
e-CLIP is useful for identifying groups of files that have 
been backed up, are being transferred, etc. At some 
point, a user (or system) may wish to access the files 
starting with nothing more than the e-CLIP. 
[0019] Once the descriptor file (or box of keys) is 
found using the e-CLIP, each of the files/records/assets 
corresponding to the binary sequence identifiers in the 
descriptor file may likewise be found using their respec- 
tive unique binary sequence identifiers. When a partic- 
ular binary sequence is obtained from a source, the 
cryptographic hash function is used to recompute the 
binary sequence identifier to verify that the asset ob- 
tained is the correct asset that was intended to be ac- 
cessed. There is no restriction on the data, meta data 
or file system structure that can be stored and refer- 
enced by an e-CLIP. 
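
The retrieval path described above (e-CLIP to descriptor file, descriptor file to assets, with the hash recomputed on every copy obtained) might look like the sketch below. The `sources` list of dictionaries, the JSON descriptor layout, and the function names are assumptions for illustration; they stand in for whatever local disks, network peers, or servers an implementation would actually query.

```python
import hashlib
import json


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def fetch_verified(identifier: str, sources) -> bytes:
    """Return the first candidate whose recomputed hash matches `identifier`.

    `sources` is an ordered list of dicts (nearest first) mapping
    identifiers to bytes. A copy that fails verification is skipped.
    """
    for source in sources:
        candidate = source.get(identifier)
        if candidate is not None and md5_hex(candidate) == identifier:
            return candidate
    raise LookupError(f"no verified copy of {identifier} found")


def open_clip(e_clip: str, sources) -> dict:
    """Resolve an e-CLIP to its descriptor, then to each listed asset."""
    descriptor = json.loads(fetch_verified(e_clip, sources))
    return {
        entry["name"]: fetch_verified(entry["id"], sources)
        for entry in descriptor["assets"]
    }
```

Because every copy is checked against its identifier, a corrupted or altered copy in a nearby source is ignored and the search continues with the next source.
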

[0020] It should be appreciated that the present inven- 
tion can be implemented in numerous ways, including 
as a process, an apparatus, a system, a device, a meth-
od, or using a computer readable medium. Several in-
ventive embodiments of the present invention are de-
scribed below.

[0021] In one embodiment, a system and method are 
disclosed for representing a plurality of assets (files, 
records, or other digital assets). The method includes 
selecting the plurality of assets (binary sequences) to 
be transferred. A plurality of cryptographic hash binary 
sequence identifiers are generated for the plurality of 
assets. Each of the plurality of cryptographic hash asset 
identifiers is computed from the contents of a particular 
asset. A descriptor file is generated that includes the plu- 
rality of cryptographic hash binary sequence identifiers 
computed from the plurality of assets to be transferred. 
A cryptographic hash descriptor file identifier is gener- 
ated that is computed from the descriptor file. The com- 
puted cryptographic hash descriptor file identifier may 
be included in another list of identifiers, and so on, so 
that complex structures can be reduced and represent- 
ed in extremely compact form. 
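
The nesting described in the last sentence can be illustrated with a short sketch. The one-identifier-per-line descriptor format and the helper `descriptor_for` are hypothetical simplifications; the point shown is only that a descriptor file hash can itself be listed in a higher-level descriptor.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def descriptor_for(identifiers) -> bytes:
    """A minimal descriptor: one identifier per line (hypothetical format)."""
    return "\n".join(identifiers).encode("ascii")


# Two leaf collections, each reduced to one descriptor file hash.
docs = descriptor_for([md5_hex(b"chapter 1"), md5_hex(b"chapter 2")])
images = descriptor_for([md5_hex(b"figure 1")])

# A top-level descriptor lists the two descriptor file hashes, so a single
# 32-hex-digit identifier stands for the whole two-level structure.
top = descriptor_for([md5_hex(docs), md5_hex(images)])
top_id = md5_hex(top)
```

A change to any leaf asset changes its identifier, which changes the enclosing descriptor and therefore the top-level identifier as well.
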
[0022] In another embodiment, a method of identify- 
ing an asset is disclosed. The method includes selecting 
an asset to be identified. A cryptographic hash asset 
identifier is obtained for the selected asset. A copy of 
the asset is obtained and the integrity of the copy of the 
asset is verified by regenerating the cryptographic hash 
asset identifier from the copy of the asset and comparing it
to the cryptographic hash asset identifier of the asset 
being identified. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0023] The present invention will be readily under- 
stood by the following detailed description in conjunction 
with the accompanying drawings in which: 

FIG. 1 is a flowchart illustrating a process for creat- 
ing a cryptographic hash descriptor file identifier of 
a descriptor file, including file meta data for the as- 
sets in the list. 

FIG. 2 is a diagram illustrating the structure of a de- 
scriptor file. 

FIG. 3 is a flowchart illustrating a process for using 
an e-CLIP to find both a descriptor file and the as- 
sets specified in the descriptor file. 

FIG. 4 is a flowchart illustrating a process running 
on an importer used to receive requested assets 
and to verify binary sequence identifiers as is spec- 
ified in step 316 of FIG. 3. 

FIG. 5 is a flowchart illustrating one embodiment of 
step 402 showing how the importer checks multi- 
cast transmissions to reconstitute assets that are 
received in portions. 



FIG. 6A is a block diagram illustrating the structure 
of an asset request generated by an importer as de- 
scribed above in step 402. 

FIG. 6B is a block diagram illustrating the structure
of a data packet that delivers file data to a requester
in response to an asset request.

FIG. 7 is a block diagram illustrating one such 
chained set of importers.

FIGS. 8 and 9 illustrate a computer system 900 suit- 
able for implementing embodiments of the present 
invention. 

DETAILED DESCRIPTION OF THE INVENTION 

[0024] Reference will now be made in detail to the pre-
ferred embodiment of the invention. An example of the
preferred embodiment is illustrated in the accompany-
ing drawings. While the invention will be described in
conjunction with the preferred embodiment, it will be un-
derstood that it is not intended to limit the invention to
one preferred embodiment. On the contrary, it is intend-
ed to cover alternatives, modifications, and equivalents
as may be included within the spirit and scope of the
invention as defined by the appended claims. For ex-
ample, for ease of understanding, many of the figures
illustrate use of the invention with traditional computer
files. As described herein, however, the present inven-
tion is suitable for use with any digital asset or binary
sequence.

[0025] In the following description, numerous specific
details are set forth in order to provide a thorough un-
derstanding of the present invention. The present inven-
tion may be practiced without some or all of these spe-
cific details. In other instances, well known process op-
erations have not been described in detail in order not
to unnecessarily obscure the present invention.

OVERVIEW 

[0026] The present invention provides a technique
and mechanism by which a reliably unique binary se-
quence identifier (also referred to as a binary sequence
hash or a cryptographic hash binary sequence identifier)
is generated for each binary sequence in a user-defined
collection of binary sequences (digital assets). These bi-
nary sequence identifiers are stored within a descriptor
file of the present invention so that true and accurate
copies of those collected files can be identified and/or
verified when a collection is reconstructed or validated.
Further, a reliably unique descriptor file identifier (or de-
scriptor file hash) is generated for the descriptor file to
serve as a representation of the collection of files.
[0027] The present invention makes it possible to in-
spect any collection of digital assets to establish wheth-
er each asset in the collection is or is not present on a
particular computer system or network without having
to provide a reference copy of the entire asset or relying 
on potentially misleading extrinsic naming or locational 
information. In this way, only those assets that can be 
proven to be missing from the system or network need 
to be obtained from other sources in order to reproduce 
the collection of assets. The preferred embodiment of 
the present invention primarily deals with digital assets 
which are data files. Appropriate interfaces would make 
it simple to extend the preferred embodiment of the in- 
vention to work equally well with digital assets which 
were records from structured files and databases of all 
types, selections or clips from streams of multimedia da- 
ta (digital audio or digital video, for example) or selec- 
tions or subsets of other structured or unstructured dig- 
ital data (binary sequences). 
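
A minimal sketch of that presence check follows, assuming local content is available as a mapping from arbitrary names to bytes. The function name `missing_identifiers` and the in-memory mapping are illustrative assumptions; the comparison by content hash alone, ignoring names and locations, is the idea taken from the text.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def missing_identifiers(wanted, local_files):
    """Return identifiers from `wanted` that no local file content matches.

    `local_files` maps arbitrary local names to content. Names are ignored;
    only content hashes are compared.
    """
    present = {md5_hex(data) for data in local_files.values()}
    return [identifier for identifier in wanted if identifier not in present]
```

Only the identifiers returned by this check would need to be requested from other sources; a renamed local copy still counts as present.
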

[0028] A preferred embodiment of the present inven- 
tion uses one of a class of cryptographic hash functions 
that uses the contents of a digital asset to produce a 
unique binary number by mathematical and/or logical 
operations. Such functions are commonly used in en- 
cryption of digital information and an extensive body of 
art exists documenting alternative methods for generat- 
ing such a unique binary number for virtually any specific 
combination of digital data. A preferred embodiment us- 
es the well-known MD5 algorithm. It should be recog- 
nized that numerous other algorithms for generating re- 
liably unique asset identifiers may be used as well. Pref- 
erably, an algorithm should consistently produce the 
same binary number for any specific instance of a digital 
file and such a binary number should be practically prov- 
en to be unique with a reasonably high probability for 
the class of binary files being identified. With such an
algorithm, two binary sequences that produce the same
cryptographic hash (binary number) can be taken, with
very high probability, to be the same. Conversely, two
binary sequences that produce different cryptographic
hashes (binary numbers) are proven to be different.
Such an algorithm simplifies the identification of copies
of a particular binary sequence.

[0029] A user-defined collection of digital assets, re- 
lated meta data and context information are grouped to 
produce a descriptor file. One example of a descriptor 
file is shown in FIG. 2. Hence, the descriptor file can be 
characterized as a box or list of keys to digital assets; 
in addition, it contains other information about those as- 
sets. This box of keys is then treated as an independent 
digital asset, and its own key is then derived from its 
unique content. The resulting key is the "key to the box 
of keys" and may be used to form an e-CLIP that repre- 
sents the collection of digital assets. 
[0030] A user or system can obtain an e-CLIP from
any trusted source. The e-CLIP can then be used to find
or identify a precise copy of the descriptor file that in turn
further includes the collection of asset information. Once
a copy of an original asset is found using the present
invention, that asset can safely be treated as a precise
copy of the original asset. If the asset is a descriptor file,
it can be read or opened and the cryptographic hash
binary sequence identifiers for the collection of digital
assets can be obtained. The files corresponding to
those binary sequence identifiers may be obtained and
verified by a comparison of the provided binary se-
quence identifiers with binary sequence identifiers new-
ly derived using the cryptographic hash function.

[0031] If the files identified in a descriptor file cannot
be found, then the collection of files cannot be recon-
structed. This is a potentially frustrating fact. Neverthe-
less, the described method provides a mechanism by
which collections of files can be reproduced reliably or
can be proven to be unavailable with equal reliability.
[0032] In a preferred embodiment, a descriptor file is
created by generating a cryptographic hash binary se-
quence identifier for each digital asset in a selected col-
lection of digital assets. The cryptographic hash binary
sequence identifier is generated by using a cryptograph-
ic hash function on the actual data content of each of
the assets. In some embodiments the entire asset is
used to generate the cryptographic hash binary se-
quence identifier, and in other embodiments, a portion
of the asset is used. Preferably, a sufficiently large por-
tion is used to ensure a high probability that the cryptographic
hash binary sequence identifier is unique. In different
embodiments, different cryptographic hash functions
are used. In a preferred embodiment, the MD5 algorithm
is used to generate a 128-bit number that represents the
file. The 128-bit number is represented as a 26-charac-
ter alphanumeric string by translation to base 36 num-
bers that are then mapped to the set of alphabetic and
numeric characters in the base ASCII character set. In
the preferred embodiment, a flag character is included
at a predetermined position within the resulting string
bringing the total length of the string to 27 characters.
This mapping is referred to as "ASCII Armoring" and is
commonly used to render binary information in a limited
character set for transmission over protocols that re-
quire content to be constrained to alphanumeric binary
coding.
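
One way the base 36 rendering could work is sketched below. Several details are assumptions, since the text does not specify them: the upper-case alphabet, big-endian interpretation of the digest, left-padding with "0" to reach 26 characters, and the flag character "E" at index 13 are all chosen only for illustration.

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(number: int, width: int) -> str:
    """Encode a non-negative integer in base 36, left-padded with '0'."""
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)).rjust(width, "0")


def armored_id(data: bytes, flag: str = "E", flag_pos: int = 13) -> str:
    """MD5 digest -> 26 base-36 characters -> flag inserted -> 27 characters."""
    value = int.from_bytes(hashlib.md5(data).digest(), "big")
    body = to_base36(value, 26)
    # The flag character and its position are hypothetical placeholders.
    return body[:flag_pos] + flag + body[flag_pos:]
```

Removing the flag character and decoding the remaining 26 characters as a base 36 number recovers the original 128-bit value, so the rendering loses no information.
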

[0033] As is described below, this particular represen-
tation of a cryptographic hash binary sequence identifier
has the advantage of being human readable and easily
communicated for use, e.g., by being written down,
transmitted by software, retrieved by data query, coded
into software application file requests, referenced by a
content or asset management system, requested in an
object browser, electronically copied and pasted from
one document to another, sent via electronic mail, etc.
[0034] A cryptographic hash function such as the
MD5 algorithm is used in one embodiment to generate
the cryptographic hash binary sequence identifier be-
cause cryptographic hash functions have been mathe-
matically proven to minimize the probability that similar
assets will be mapped onto the same cryptographic
hash binary sequence identifier. This is important be-
cause the cryptographic hash binary sequence identifier
is used as a unique asset identifier and the generation
of the same cryptographic hash binary sequence iden-
tifier from two assets is assumed to conclusively show
that the assets are identical. Conversely, it is equally
useful to note that binary sequences which are not the same
will produce different binary sequence identifiers and
such results can conclusively show two binary sequenc-
es are not identical. The MD5 algorithm produces a high
confidence level and is thus highly reliable as a tech-
nique for producing a unique asset identifier.
[0035] Other hash functions or other types of func- 
tions based on the binary sequence (content) may be 
used to generate asset identifiers so long as the prob-
ability of generating identical identifiers from different
files is below a threshold that is defined as acceptable. 
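
As a rough guide to what such a threshold means in practice, the standard birthday bound can be applied. This estimate is not part of the original text and assumes the function behaves like a uniformly random mapping onto b-bit values; n is the number of distinct binary sequences being identified.

```latex
p_{\text{collision}}
  \;\approx\; 1 - \exp\!\left(-\frac{n(n-1)}{2^{\,b+1}}\right)
  \;\le\; \frac{n(n-1)}{2^{\,b+1}}
% Example: b = 128 (MD5 output size), n = 10^{12} distinct assets
%   p <= 10^{24} / 2^{129} \approx 1.5 \times 10^{-15}
```

The bound covers accidental collisions only. Deliberately constructed MD5 collisions are known, so it says nothing about adversarial inputs.
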
[0036] Once the cryptographic hash file identifier for
each of the selected assets is generated, it is included
in a descriptor file along with other asset information.
The other asset information included
with the file list may include directory information about
how the assets are organized within a computer system,
as well as file names, file sizes, time and date stamps
for each asset, ownership of the asset, and other asset
meta data as is described below. The descriptor file may
also include data about the context or implications of the 
collection of assets, the purposes for which the collec- 
tion is being created, or any other information. Then, in 
a preferred embodiment, the descriptor file is stored in 
a digital file in a suitable form for the type of computer 
system or other environment in which the descriptor file 
resides. In other embodiments, the descriptor file might 
be stored in a database or other digital repository pro- 
viding convenient, efficient, and secure storage and re- 
trieval capabilities. A cryptographic hash binary se- 
quence identifier for the stored descriptor file is then 
computed which, in one embodiment, becomes the 
e-CLIP by which the collected assets may be refer- 
enced, found, and verified. It should be recognized that 
the e-CLIP that identifies the collection may be pro- 
duced by the same algorithm used to compute the cryp- 
tographic hash binary sequence identifiers for the indi- 
vidual assets named and listed within the descriptor file. 
In other embodiments, the binary sequence identifier for 
the descriptor file is combined with other information 
(such as a file locator) to form the e-CLIP. 

e-CLIP GENERATION 

[0037] FIG. 1 is a flowchart illustrating a process for 
creating a cryptographic hash binary sequence identifier 
of a list of assets, including meta data and context data 
for the assets in the list. In step 102, a list of assets which 
are to be represented is selected and the asset data, 
meta data, and/or context data is collected. The list of 
assets may include multiple assets, only one asset, or 
no asset. The list of assets may even include previously 
created descriptor files or assets that include an e-CLIP.
If a descriptor file contains no digital assets (files,
database records, multimedia clips, etc.), then the descriptor
file to be created may contain other data that is used to
locate and obtain digital assets using a selected scheme
or may contain valuable collections of meta data and
context data without reference to independent binary
sequences. In such a case, a cryptographic hash binary
sequence identifier for the descriptor file still ensures the
integrity of the data in the descriptor file.

[0038] When at least one asset is selected, in step
104 a cryptographic hash binary sequence identifier is
generated for each of the assets selected. As noted
above, in one embodiment, the MD5 algorithm is used
to generate the cryptographic hash binary sequence
identifier. Thus, a cryptographic binary sequence hash
is used as an asset identifier for each of the assets. In
step 106, a descriptor file is created using the meta data
associated with each asset, meta data about the assets,
and context data about the collection, and the cryptographic
hash binary sequence identifiers generated in
step 104. An example of a descriptor file is shown in
FIG. 2 below.

[0039] In step 108, a cryptographic hash is generated
of the descriptor file itself. Each of the cryptographic
hash binary sequence identifiers in the descriptor file
may be thought of as a key to the digital asset which the
cryptographic hash file identifier identifies. Thus, the
descriptor file can be thought of as a collection or "box" of
keys. The cryptographic hash binary sequence identifier
of the descriptor file is referred to as a cryptographic
hash binary sequence list identifier and can be thought
of as the key to the box of keys that are listed in the
descriptor file. The cryptographic hash binary sequence
list identifier is used to locate and verify the descriptor
file. The contents of the descriptor file are then in turn
used to locate and verify each of the assets represented
in the descriptor file. In step 110, the cryptographic hash
binary sequence list identifier is converted to ASCII
format. As noted above, in one embodiment, a 128-bit file
list identifier is converted to a 27-character base 36
ASCII string. The 27-character string is thus in human
readable text form and may be copied manually or
electronically for processing, reference or storage.
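
Steps 104 through 110 can be illustrated with a minimal sketch. It assumes Python's standard hashlib MD5 implementation, a toy tab-separated descriptor layout, and zero-padding of the base 36 string to 27 characters; the text does not specify the descriptor encoding or the padding rule, so the helper names and both of those choices are hypothetical.

```python
import hashlib

BASE36_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base36(digest: bytes, width: int = 27) -> str:
    """Render a hash digest as a fixed-width base 36 ASCII string."""
    value = int.from_bytes(digest, "big")
    chars = []
    while value:
        value, remainder = divmod(value, 36)
        chars.append(BASE36_DIGITS[remainder])
    # Assumption: left-pad with zeros up to the 27 characters named in the text.
    return "".join(reversed(chars)).rjust(width, "0")

def asset_identifier(content: bytes) -> str:
    """Step 104: identifier derived only from the asset's content."""
    return to_base36(hashlib.md5(content).digest())

def build_descriptor(assets: dict) -> bytes:
    """Step 106: hypothetical descriptor, one 'name, size, identifier' line per asset."""
    lines = [
        f"{name}\t{len(data)}\t{asset_identifier(data)}"
        for name, data in sorted(assets.items())
    ]
    return "\n".join(lines).encode("utf-8")

def make_eclip(assets: dict):
    """Steps 108-110: hash the descriptor itself and encode the result as text."""
    descriptor = build_descriptor(assets)
    return to_base36(hashlib.md5(descriptor).digest()), descriptor
```

Because the identifier depends only on the descriptor bytes, changing any asset changes its line in the descriptor and therefore the resulting string, while the same collection always yields the same string.
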
[0040] In one embodiment, the cryptographic hash binary
sequence list identifier is stored as a bar code. This
is particularly useful when identifying information about
an object is placed on the object itself. For example, an
appliance such as a microwave or a VCR could have a
bar code placed on the appliance that represents a
cryptographic hash binary sequence list identifier that was
generated from assets that include the manual or other
documentation related to the appliance. Thus, the manual
and related documentation can be obtained by:
scanning the bar code; reading the descriptor file
identifier; obtaining the descriptor file that corresponds to the
descriptor file identifier; reading the descriptor file and
the individual binary sequence identifiers within it;
obtaining the digital assets that correspond to the binary
sequence identifiers; and finally, reading the obtained
assets that contain the manual and related documentation.
Similar coding in other indices or software applications
can be used to specify, search for, and acquire other
digital assets containing data or software code.
[0041] In step 112, the ASCII string is stored as a 
unique identifier or e-CLIP. The unique identifier is easily 
read or copied by either human or electronic means. 
Next, in step 114, in certain embodiments, the unique 
identifier can be combined with a file locator (as an 
e-CLIP hint) to form the e-CLIP. The file locator indicates 
a possible location of the purported descriptor file and 
associated digital assets (binary sequences). It should 
be noted that in many embodiments, e-CLIPs do not 
need to include a descriptor file locator (e-CLIP hint). 
However, the inclusion of a descriptor file locator as the 
place where the descriptor file may be found is beneficial 
in many instances, and especially if the descriptor file 
or one or more associated digital assets is not found in 
a convenient location first. 

[0042] Thus, the e-CLIP is represented by a unique 
identifier which, in one embodiment, is a human reada- 
ble version of a cryptographic hash binary sequence list 
identifier. The cryptographic hash binary sequence list 
identifier is a unique reference to information of arbitrary 
size, type, complexity, and file structure. That is, the 
cryptographic hash binary sequence list identifier may 
represent any number of digital assets further described 
by any amount of relevant meta data about file system 
structures, database relationships, multimedia content 
information, or other useful information. An example of 
a directory structure specified in a descriptor file is 
shown in FIG. 2. 

[0043] FIG. 2 is a diagram illustrating the structure of 
a descriptor file 200. The particular descriptor file shown 
uses a "hyperfile" modeling language (HFML) based on 
XML to describe the structure of the directories contain- 
ing files as well as the files themselves. An HFML is de- 
scribed in the provisional patent application referenced 
above. In general, it should be noted that implementa- 
tion of an e-CLIP is not restricted to a descriptor file writ- 
ten in this syntax. The HFML in the preferred embodi- 
ment is used because it is readily parsed and can be 
used to generate a tree-structured directory of the files 
and keys contained in the descriptor file. This example 
restricts itself to a description of files and keys from a 
particular form of computer and software. The invention 
provides for extension of the languages or codes used 
to create descriptor files to describe virtually any digital 
asset, relationships, and other meta and context data 
without limitation. 

[0044] The first item in descriptor file 200 is a folder 
202. A folder name 204 as well as a time stamp 206 are 
included in folder 202. Folder 202 matches up with an 
end folder tag 208 that marks the end of folder 202. 
Nested inside of folder 202 is a first nested folder 212. 
Folder 212 includes a folder name 214 and a time stamp
216. A file 222 is included inside of folder 212. File 222
includes a file name 224, a time and date stamp 226, a
size 228, and a cryptographic hash file identifier 230
generated by the MD5 algorithm and represented as a
27-character string. Likewise, folder 212 also includes
a file 232. File 232 includes a file name 234, a time and
date stamp 236, a size 238, and a cryptographic hash
file identifier 240. Folder 212 matches with an end folder
tag 219.

[0045] It should be evident that an arbitrary number
of folders can thus be represented and nested within
other folders as desired, so that an arbitrary tree-shaped
directory can be specified with an arbitrary number of
files specified in each of the folders of the directory. Each
of the files may include a file name and other meta data
as desired plus a cryptographic hash binary sequence
identifier that uniquely identifies the file based on the
content of the file. In some embodiments, the cryptographic
hash binary sequence identifier is the only identifier
for the file; in other embodiments a conventional
file name is also associated with the file.
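
The nested folder and file layout described for FIG. 2 can be sketched as an XML tree. The HFML syntax itself is not reproduced in this text, so the element names (`folder`, `file`), the attribute names, the dates, and the placeholder identifiers below are all hypothetical; the sketch only shows that such a descriptor is readily parsed back into a directory tree of names and keys.

```python
import xml.etree.ElementTree as ET

# Placeholder 27-character identifiers, not real hashes.
ID_A = "0" * 26 + "A"
ID_B = "0" * 26 + "B"

def folder(name, time, children=()):
    """Build a hypothetical <folder> element holding nested folders or files."""
    element = ET.Element("folder", {"name": name, "time": time})
    element.extend(children)
    return element

def file_entry(name, time, size, identifier):
    """Build a hypothetical <file> element carrying meta data and the content key."""
    return ET.Element(
        "file", {"name": name, "time": time, "size": str(size), "id": identifier}
    )

def list_paths(element, prefix=""):
    """Walk the tree and return (path, identifier) pairs for every file."""
    name = element.get("name")
    path = f"{prefix}/{name}" if prefix else name
    if element.tag == "file":
        return [(path, element.get("id"))]
    pairs = []
    for child in element:
        pairs.extend(list_paths(child, path))
    return pairs

# Illustrative content only: one folder nested in another, holding two files.
root = folder("projects", "1999-01-01T00:00:00", [
    folder("docs", "1999-01-02T00:00:00", [
        file_entry("manual.txt", "1999-01-03T00:00:00", 1024, ID_A),
        file_entry("notes.txt", "1999-01-04T00:00:00", 2048, ID_B),
    ]),
])
descriptor_text = ET.tostring(root, encoding="unicode")
```
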

[0046] Thus, it will be appreciated that when the
descriptor file specified by an e-CLIP is recovered (for
example, after a computer crash) and where the descriptor
file contains definitions of computer files (possibly in
addition to other digital assets), complete file name and
directory information for the files that are specified by
the e-CLIP is obtained. The process of retrieving the
descriptor file and finding the files specified in the
descriptor file is described in FIG. 3 below.


FILE RETRIEVAL 

[0047] FIG. 3 is a flowchart illustrating a process for
using an e-CLIP to find both a descriptor file and the
digital assets (binary sequences) which, in this example,
are files specified in the descriptor file and for putting
the files in the directory structure specified by the
descriptor file. In step 302 an e-CLIP is received. The
e-CLIP may be received embedded in an e-mail message
where the e-CLIP is being used by a user to specify
a set of files. Alternatively, the e-CLIP may be generated
automatically by a network device performing the backup
of the files and directories specified in the e-CLIP.
The e-CLIP may be produced by a business application,
sealing the relevant digital assets relating to a
particular transaction. In addition, e-CLIPs may be generated
for other reasons by any user, network node, application
or hardware device that needs to uniquely
specify a file or group of files for some purpose. Such
e-CLIPs may be embedded in and readily accessed
from database applications, legacy applications running
on mainframes, text retrieval applications, web sites,
etc.

[0048] In step 304 the recipient of the e-CLIP broadcasts
a request for the file corresponding to the unique
identifier found in the e-CLIP. Next, in step 306 the
e-CLIP recipient receives a descriptor file purporting to
correspond to the unique identifier. Next, in step 308 the
recipient calculates the cryptographic hash of the
descriptor file received using the same cryptographic hash
function that was used to generate the unique identifier
found in the e-CLIP. In step 310 the recipient verifies
that the unique identifier found in the e-CLIP matches
the result of the cryptographic hash of the descriptor file.
If the unique identifier is not properly verified, then
control is transferred back to step 304 and the request for
the file identified in the e-CLIP is sent again. An error
message or other notification may be generated as well.
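
The request-and-verify loop of steps 304 through 310 can be sketched as follows. The sketch assumes the unique identifier is carried as a hexadecimal MD5 digest rather than the base 36 form, models the broadcast as a caller-supplied `request_descriptor` callable, and caps the number of retries; the callable, the `notify` hook, and the retry cap are hypothetical and are not specified in the text.

```python
import hashlib

def fetch_verified_descriptor(unique_id, request_descriptor, max_attempts=3,
                              notify=lambda message: None):
    """Request a descriptor until one hashes to the e-CLIP's unique identifier.

    unique_id          -- hex MD5 digest taken from the e-CLIP (assumed encoding)
    request_descriptor -- hypothetical callable standing in for the broadcast
                          request; returns candidate bytes or None
    """
    for attempt in range(1, max_attempts + 1):
        candidate = request_descriptor(unique_id)            # steps 304 and 306
        if candidate is not None:
            computed = hashlib.md5(candidate).hexdigest()    # step 308
            if computed == unique_id:                        # step 310
                return candidate
        # Verification failed: report it and loop back to the request step.
        notify(f"attempt {attempt}: descriptor did not verify")
    raise LookupError(f"no descriptor verified for {unique_id}")
```

A descriptor that fails the comparison is simply discarded, so a responder that returns the wrong file cannot cause it to be accepted.
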
[0049] If the unique identifier is verified in step 310, 
then control is transferred to step 312 and the recipient 
builds the directory structure specified in the descriptor 
file. Programming logic is applied to perform system 
configurations and file operations to create the required 
directories, using programming operations such as 
those described by HFML, for example. Preferably, 
each folder in the directory is created according to the 
specified structure. Next, in step 314 the recipient of the 
e-CLIP broadcasts a request for the files listed in the 
descriptor file. FIG. 6A shows an example structure for 
a file request. 

[0050] Responses offering copies of the requested 
files are analyzed and copies of the files are retrieved 
from the most effective sources available including local 
file systems, local networked file systems available to 
the system on which the recipient is executing, standard 
networking protocols such as the File Transfer Protocol 
(FTP), or through any other networked protocol as may 
be devised or specified. 

[0051] In step 316 the recipient of the e-CLIP receives
the files requested and verifies the file contents by gen- 
erating cryptographic hashes of the file data and com- 
paring the results to the file identifiers listed in the de- 
scriptor file. If any files fail the verification test, then 
those files are requested again and an appropriate no- 
tification is generated. The process then ends. 
[0052] Thus, a recipient of an e-CLIP broadcasts a re- 
quest for the descriptor file identified by the unique iden- 
tifier in the e-CLIP. Once the descriptor file is received, 
the e-CLIP recipient is able to verify that the correct de- 
scriptor file has been recovered and then broadcasts re- 
quests for the files specified in the descriptor file. Those 
files are inserted into the directory structure specified in 
the descriptor file once they are received and verified. 
The process for broadcasting requests for files, receiv- 
ing and verifying files, and modifying the broadcast re- 
quest is accomplished in one embodiment using an im- 
porter, which is a small program encoded preferably in 
the JAVA programming language, or in any other suita- 
ble language. 

[0053] FIG. 4 is a flowchart illustrating a process run- 
ning on an importer used to receive requested digital 
assets (binary sequences) which may be files and to 
verify their file identifiers as is specified in step 316 of 
FIG. 3. It should be noted that other processes and lan- 
guages to request and verify such file identifiers may be 
used within the spirit and scope of the invention. In step
402 the importer waits to receive files. When a file is
received, control is transferred to step 404. The process
of receiving a file in parts and assembling those parts is
further described in FIG. 5. In step 404, the importer
verifies that the cryptographic hash of the file received
matches the file identifier that was sent out requesting
the file. If the file identifier is not verified, then control is
transferred to step 406 where an error handler is
activated. Then, in step 408 a request for the entire file is
generated and control is transferred back to step 402.
[0054] If, however, the file is verified, then the file
request list for broadcasts is updated in step 410 and
control is transferred to step 412. In step 412 it is determined
whether all the files have been received that were
specified in the descriptor file identified in the e-CLIP. If all
files have been received, then control is transferred to
step 414 and it is indicated that all of the e-CLIP files
have been obtained. The process then ends. As long as
all of the files have not been received, control transfers
from step 412 back to step 402 so that the rest of the
files may be received and checked.
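
The importer loop of FIG. 4 can be sketched as below. The sketch assumes identifiers are hexadecimal MD5 digests and that the transport is supplied by the caller as two callables, `receive_file` and `request_file`; those names and the `on_error` hook are hypothetical stand-ins for the broadcast mechanism and the error handler.

```python
import hashlib

def run_importer(wanted_ids, receive_file, request_file, on_error=print):
    """Receive files until every identifier on the request list is verified.

    receive_file -- hypothetical callable returning (identifier, data) pairs
    request_file -- hypothetical callable that re-requests an entire file
    """
    outstanding = set(wanted_ids)              # the file request list
    verified = {}
    while outstanding:                         # step 412: anything still missing?
        file_id, data = receive_file()         # step 402: wait for a file
        if file_id not in outstanding:
            continue                           # not requested, or already verified
        if hashlib.md5(data).hexdigest() != file_id:   # step 404: verify content
            on_error(f"verification failed for {file_id}")   # step 406
            request_file(file_id)              # step 408: request the whole file again
            continue
        verified[file_id] = data
        outstanding.discard(file_id)           # step 410: update the request list
    return verified                            # step 414: all files obtained
```
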
[0055] It must be noted that the examples used in this
description all make reference to files. The assets,
however, may be references to database records, video
clips taken from within larger video streams, or other
digital assets stored to be passed to other software
programs or processes. Rather than instantiating directories
and creating files with the contents of the digital
assets, the recipient would make them available via some
other standard application programming interface. The
process of requesting the assets, verifying their contents,
keeping them in a temporary quarantine holding
area, and making the collection available after all assets
are received and verified is logically the same for any
collection of digital assets no matter their type or
source.

[0056] This type of digital asset quarantine and
verification procedure using content-addressable asset
identifiers ensures asset integrity, excludes spoofing
and virus infection automatically, and permits automated
or manual reconstruction of lost assets. The
content-addressable asset identifier system of the present
invention is superior to other asset identification systems
where identifiers are not derived from the digital asset
contents but instead depend on a path name locator, file
name, file author, file creation/modification date, file
size, or other environmental or application meta-data.
Because such prior art identifiers are not content
addressable, they may be readily spoofed. By contrast, the
verify step 404 of the present invention allows errors in
assets to be detected. Advantageously, under the
present invention, if errors are detected and a virus or
spoofing is suspected, suspect assets may be reconstituted
from another location or a more secure method of
file transfer may be invoked.

[0057] FIG. 5 is a flowchart illustrating one embodiment
of step 402 showing how the importer checks "multicast"
transmissions in order to reconstitute assets that
are received in portions. Multicast transmissions are
transmissions from a peer that are addressed to all
peers available on the network. Similarly, a multicast
request may be sent by sending a request to all peers
available on the network. Peers include any device
included in a defined multicast group; a multicast group
may include any device accessible over a data link. This
method is referred to as the "Swiss cheese" method
because it fills in assets by placing the chunks of an asset
in the proper order and continues to request chunks that
are needed to fill in the holes. The method permits
multiple source, variable, nonsequential digital asset
segment transfer in response to a request using a
content-addressable asset name (such as a cryptographic hash
asset identifier). Of course, other methods for receiving
files may also be used.

[0058] In step 510 the importer receives a multicast
transmission. Next, in step 520 the importer checks its
asset request list (akin to a shopping list) to see if a
digital asset portion or segment received is needed. If the
asset segment is not needed, control is transferred back
to step 510. If the asset segment is needed, then the
data is stored in the proper order based on its sequence
number in step 530 and the asset request list is updated
so that that particular asset segment will not be requested
any longer. The process then ends.
[0059] It should be noted in the above described protocol
that digital assets (binary sequences) are received
in parts as portions or segments and that the asset
request list includes all of the assets that are being
requested until those assets (binary sequences) are
received in their entirety. In other embodiments, assets
may be received whole or in a manner specified by any
file transfer protocol. It is also possible that in some
embodiments, an asset segment request list would be
implemented that would include individual segments being
requested. For example, individual segments of assets
may be requested when data or a code patch for a
software application is required, or when specific entries for
a database are obtained by a store or query result. The
importer manages the transfer of assets to the recipient
of an e-CLIP and determines when the assets are complete
so that the cryptographic hash file identifier specified
in the descriptor file may be used to verify that the
correct asset has been received. Verification is achieved
by comparing the cryptographic hash asset identifier to
a newly generated MD5 cryptographic hash asset identifier
calculated using the received asset (binary sequence).

[0060] FIG. 6A is a block diagram illustrating the structure
of an asset request generated by an importer as
described above in step 402. A request 600 includes an
asset identifier 602, a sequence number 604, and a
chunk size 606. The asset identifier is obtained from the
descriptor file. The sequence numbers may be generated
by the importer based on the size of the asset segments
that it will request. The chunk size is specified by
the importer in certain embodiments. It should be noted
that in other embodiments, the chunk size is specified
by the system and is not changeable by individual file
importers.

[0061] FIG. 6B is a block diagram illustrating the struc- 
ture of a data packet that delivers binary asset data to 
a requester in response to an asset request. A data 
packet 610 includes an asset identifier 612, a sequence
number 614, and data 616 which represents the asset 
segment data itself. The length of the data corresponds 
to the length of the chunk size 606 specified in the re- 
quest 600. Thus, incoming asset segments can be or- 
dered according to their sequence number and the data 
in the file can be recovered from the ordered segments 
once all of the segments have been received. Notably, 
portions may be received from different sources in non- 
sequential order and concatenated or filled in to create 
the target digital asset (binary sequence). 
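
The request and data packet structures of FIGS. 6A and 6B, together with the "Swiss cheese" filling-in of segments, can be sketched as follows. The sketch assumes the total asset size is known in advance (for example from the size recorded in the descriptor file) and that the final segment may be shorter than the chunk size; the class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssetRequest:            # request 600
    asset_id: str              # asset identifier 602
    sequence_number: int       # sequence number 604
    chunk_size: int            # chunk size 606

@dataclass(frozen=True)
class DataPacket:              # data packet 610
    asset_id: str              # asset identifier 612
    sequence_number: int       # sequence number 614
    data: bytes                # data 616

class SegmentAssembler:
    """Collects segments in any order and tracks the holes still to be filled."""

    def __init__(self, asset_id, total_size, chunk_size):
        self.asset_id = asset_id
        self.chunk_size = chunk_size
        segment_count = -(-total_size // chunk_size)    # ceiling division
        self.missing = set(range(segment_count))        # the "shopping list"
        self.segments = {}

    def pending_requests(self):
        """Requests for every segment that has not arrived yet."""
        return [AssetRequest(self.asset_id, n, self.chunk_size)
                for n in sorted(self.missing)]

    def accept(self, packet):
        """Store a needed segment; ignore duplicates and other assets' packets."""
        if packet.asset_id != self.asset_id or packet.sequence_number not in self.missing:
            return False
        self.segments[packet.sequence_number] = packet.data
        self.missing.discard(packet.sequence_number)
        return True

    def assemble(self):
        """Concatenate the segments in sequence order once no holes remain."""
        if self.missing:
            raise ValueError(f"segments still missing: {sorted(self.missing)}")
        return b"".join(self.segments[n] for n in sorted(self.segments))
```

Once `assemble` succeeds, the result would still be hashed and compared against the asset identifier before being accepted, as in the verification step described above.
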
[0062] In one embodiment, the importer has a specific 
hierarchy of locations in a computer system (or on a net- 
work) in which it looks for the assets listed in the de- 
scriptor file. Thus, the importer may be implemented us- 
ing a chained system of importers which look for digital 
assets in different places. 

[0063] FIG. 7 is a block diagram illustrating one such 
chained set of importers. A verifier importer 702 at- 
tempts to first verify that the digital asset is stored on a 
local disk in asset storage. Asset storage is an area of 
local memory reserved for storing data in a binary form 
in a way optimized for instant retrieval using a crypto- 
graphic hash file identifier. If the verifier importer finds 
the digital asset in asset storage, then the verifier checks 
the cryptographic hash asset identifier by calculating it 
and then verifies that the asset in asset storage is actu- 
ally the asset being requested. 
[0064] If the verifier importer is not able to find the dig- 
ital asset in asset storage, then a find importer 704 is 
enabled to locate the asset (if a file) in local conventional 
storage, if possible. If the asset is a file and is not found 
in local conventional storage or is some other form of 
digital asset, then a multicast importer 706 such as the 
one described above in FIG. 5 is enabled to broadcast 
signals within the multi-cast group of the recipient of the 
e-CLIP to attempt to obtain the assets specified by the 
e-CLIP. If the assets can not be obtained by the multi- 
cast importer, then a copy importer 708 is used to look 
for the asset stored as a content-addressable file on any 
mounted volumes on file servers that are accessible to 
the copy importer. If the assets still are not found, then 
a download importer 710 is used to download the asset 
stored as a content-addressable file from an FTP server 
or some other outside source. As noted above, certain 
e-CLIPs may include a resource locator such as a URL 
that specifies a specific outside location where the as- 
sets stored as files included in the e-CLIP may be found 
if they are not obtainable by any of the other importers 
above importer 710. Alternatively, a traditional file
transfer request can be used.

[0065] Thus, the importers are in a hierarchy and assets
are searched for first in the most convenient location
and then in progressively less convenient locations.
This "assembly line" of importers is configurable in kind
and quantity of importers and may automatically and
dynamically change to optimize economy, security or
performance. Because the cryptographic hash asset
identifiers serve as content-based file names that enable the
content of assets to be verified once the assets are
recovered, it is possible to allow assets to be recovered
from arbitrary locations where they may be found
without regard to checking the contents of the asset using
some sort of check sum. Advantageously, the cryptographic
hash asset identifier acts as both a digital asset
(binary sequence) identifier and a means for verifying
the asset contents.
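
The chained importers of FIG. 7 can be sketched as an ordered list of lookup functions tried from most to least convenient. The sketch assumes hexadecimal MD5 identifiers and models each importer as a callable that returns the asset bytes or None; the chain entries in the usage note are hypothetical in-memory stand-ins for asset storage, multicast peers, and a download server.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

# An importer takes an asset identifier and returns the bytes it found, or None.
Importer = Callable[[str], Optional[bytes]]

def import_asset(asset_id: str,
                 chain: Sequence[Tuple[str, Importer]]) -> Tuple[str, bytes]:
    """Try each importer in order and accept the first result that verifies."""
    for name, importer in chain:
        data = importer(asset_id)
        if data is None:
            continue                  # not found here, fall through to the next importer
        if hashlib.md5(data).hexdigest() == asset_id:
            return name, data         # the content matches its own identifier
        # A mismatch means corruption or spoofing, so the result is discarded.
    raise LookupError(f"asset {asset_id} not found by any importer")
```

A chain might be assembled as `[("verifier", local_store.get), ("multicast", peers.get), ("download", server.get)]`, where each of the three objects is an ordinary dictionary keyed by identifier. Because every result is checked against the identifier, the order of the chain affects only cost, not correctness.
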

CONCLUSION 

[0066] In one embodiment, a system and method has
been described for specifying a collection of files having
an arbitrary directory structure to be reconstructed from
whatever sources are available to the target system.
The files are described in a directory structure in a
descriptor file and a cryptographic hash file list identifier
(e-CLIP) is generated for the descriptor file. The e-CLIP
represents the collection of files and may then be
transferred, stored, etc.

[0067] When the descriptor file is obtained by the
recipient using the e-CLIP, the descriptor file is verified as
the correct descriptor file specified by the e-CLIP using
the same algorithm that was used to generate the
cryptographic hash file list identifier. Then, each of the files
specified in the descriptor file is recovered using an
importer and the files are verified using the cryptographic
hash file identifiers for each file. Thus, information is
obtainable by a recipient using means more efficient
than simply receiving all of the information over one
communication line from the information sender's
location. The information may be reliably gathered by the
recipient of the e-CLIP because the e-CLIP contains
cryptographic hash file identifiers for each file that are
used to verify the contents of the files.
[0068] In addition to specifying files for transfer from
one entity to another, the e-CLIP described herein can
also be used to create a record of the exact state of any
collection of files in a computer at any given time. This
is done by generating an e-CLIP that is a cryptographic
hash file identifier of a descriptor file that includes
directory information for that collection of files in the
computer. Preferably, all of the files are first backed up
elsewhere for later retrieval if necessary. If the computer files
are lost for any reason, the e-CLIP is used to retrieve
the descriptor file (which has been stored in a safe
location). The descriptor file can then be used to retrieve
all of the files that are referenced within it, either locally
or over a network. Preferably, the importers described
herein are used to retrieve the files.
[0069] Thus, the state of the files in the computer may
be recorded exactly by simply generating an e-CLIP for
the files, storing the e-CLIP safely, and making sure cop- 
ies of the files exist elsewhere. The files may be recov- 
ered if needed by retrieving the e-CLIP, using the e-CLIP 
to find the descriptor file, opening the descriptor file, and 
then using the importers to retrieve the correct versions 
of all of the files represented therein. 
[0070] This is an efficient way to back up multiple 
computers on a network when many of the computers 
contain the same files. Each computer on the network 
generates a descriptor file describing all of its files as 
well as its directory structure. The descriptor file is sent 
to a central backup computer that makes certain that it 
contains all of the files specified in each of the descriptor 
files. The amount of data compression achieved by this 
scheme can be extremely large when the computers be- 
ing backed up contain many common files as is the case 
with Personal Computers on Local Area Networks 
(LANs). Each file need only be obtained and stored once 
by the central backup computer and then specified as 
many times as is necessary in the individual descriptor
files that represent the files found on the individual
computers.

[0071] Subsequent backups of the same computers 
may be accomplished by generating a new descriptor 
file which includes file hashes for the new or changed 
files, and having the central computer store the new de- 
scriptor file and all of the newly created or changed files. 
Thus, new or changed files may be reliably identified 
and copied to the central backup computer without mov- 
ing previously archived, unmodified files. Storing peri- 
odic backups for each computer can thus be accom- 
plished without requiring prohibitive amounts of file stor- 
age since each new backup only requires additional 
storage for new or changed files. 
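
The backup scheme described above can be sketched with a central store keyed by content identifier. The sketch assumes hexadecimal MD5 identifiers and in-memory dictionaries in place of real storage and network transfer; the class name, method name, and descriptor layout (a mapping from path to identifier) are hypothetical.

```python
import hashlib

class CentralBackup:
    """Stores each distinct file content once, plus one descriptor per backup."""

    def __init__(self):
        self.assets = {}        # identifier -> content, kept once however often it is referenced
        self.descriptors = []   # (computer name, {path: identifier}) per backup

    def backup(self, computer, files):
        """Record a computer's files and return how many had to be transferred."""
        descriptor = {
            path: hashlib.md5(data).hexdigest() for path, data in files.items()
        }
        transferred = 0
        for path, identifier in descriptor.items():
            if identifier not in self.assets:       # only new or changed content moves
                self.assets[identifier] = files[path]
                transferred += 1
        self.descriptors.append((computer, descriptor))
        return transferred
```

Every backup adds a descriptor, but storage grows only with content the central computer has not seen before, which is the source of the savings when many computers hold the same files.
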
[0072] Similar mechanisms will be embodied which 
perform archiving of individual records of databases, 
Web pages, and/or any other digital assets which may 
be identified by some process and made available to 
mechanisms identical in nature to those described 
above. 

[0073] Although the foregoing invention has been de- 
scribed in some detail for purposes of clarity of under- 
standing, it will be apparent that certain changes and 
modifications may be practiced within the scope of the 
appended claims. For example, a wide variety of algo- 
rithms may be used to compute a unique file identifier 
for an asset, and for the descriptor file. Also, the asset 
list hash may be represented in any suitable form such 
as binary, another suitable base, ASCII, alphanumeric, 
etc. The e-CLIP, descriptor file and individual assets 
may be stored in the same location or in different loca- 
tions. Many different forms may be used for the descrip- 
tor file; it may contain solely the asset hash for each dig- 
ital asset (binary sequence), or a wide variety of other 
information. Assets may be retrieved using the import- 
ers described herein, or using other techniques. Assets 
may be verified only if there is an exact match, or some 
room for error to allow for minor changes in files may
also be acceptable. Accordingly, the present embodiments
are to be considered as illustrative and not restrictive,
and the invention is not to be limited to the details
given herein, but may be modified within the scope
and equivalents of the appended claims.

COMPUTER SYSTEM EMBODIMENT 

[0074] FIGS. 8 and 9 illustrate a computer system 900
suitable for implementing embodiments of the present
invention. FIG. 8 shows one possible physical form of
the computer system. Of course, the computer system
may have many physical forms ranging from an integrated
circuit, a printed circuit board and a small handheld
device up to a huge super computer. Computer system
900 includes a monitor 902, a display 904, a housing
906, a disk drive 908, a keyboard 910 and a mouse 912.
Disk 914 is a computer-readable medium used to transfer
data to and from computer system 900.
[0075] FIG. 9 is an example of a block diagram for 
computer system 900. Attached to system bus 920 are 
a wide variety of subsystems. Processor(s) 922 (also 
referred to as central processing units, or CPUs) are 
coupled to storage devices including memory 924. 25 
Memory 924 includes random access memory (RAM) 
and read-only memory (ROM). As is well known in the 
art, ROM acts to transfer data and instructions uni-di- 
rectionally to the CPU and RAM is used typically to 
transfer data and instructions in a bi-directional manner. 30 
Both of these types of memories may include any suit- 
able of the computer-readable media described below. 
A fixed disk 926 is also coupled bi-directionally to CPU 
922; it provides additional data storage capacity and 
may also include any of the computer-readable media 35 
described below. Fixed disk 926 may be used to store 
programs, data and the like and is typically a secondary 
storage medium (such as a hard disk) that is slower than 
primary storage. It will be appreciated that the informa- 
tion retained within fixed disk 926, may, in appropriate *o 
cases, be incorporated in standard fashion as virtual 
memory in memory 924. Removable disk 914 may take 
the form of any of the computer-readable media de- 
scribed below. 

[0076] CPU 922 is also coupled to a variety of input/
output devices such as display 904, keyboard 910,
mouse 912 and speakers 930. In general, an input/out-
put device may be any of: video displays, track balls,
mice, keyboards, microphones, touch-sensitive dis-
plays, transducer card readers, magnetic or paper tape
readers, tablets, styluses, voice or handwriting recog-
nizers, biometrics readers, or other computers. CPU
922 optionally may be coupled to another computer or
telecommunications network using network interface
940. With such a network interface, it is contemplated
that the CPU might receive information from the net-
work, or might output information to the network in the
course of performing the above-described method
steps. Furthermore, method embodiments of the
present invention may execute solely upon CPU 922 or
may execute over a network such as the Internet in con-
junction with a remote CPU that shares a portion of the
processing.

[0077] In addition, embodiments of the present inven- 
tion further relate to computer storage products with a 
computer-readable medium that have computer code 
thereon for performing various computer-implemented 
operations. The media and computer code may be 
those specially designed and constructed for the pur- 
poses of the present invention, or they may be of the 
kind well known and available to those having skill in the 
computer software arts. Examples of computer-reada- 
ble media include, but are not limited to: magnetic media 
such as hard disks, floppy disks, and magnetic tape; op- 
tical media such as CD-ROMs and holographic devices; 
magneto-optical media such as floptical disks; and hard- 
ware devices that are specially configured to store and 
execute program code, such as application-specific in- 
tegrated circuits (ASICs), programmable logic devices 
(PLDs) and ROM and RAM devices. Examples of com- 
puter code include machine code, such as produced by 
a compiler, and files containing higher level code that 
are executed by a computer using an interpreter. 



Claims 

1. A method of identifying a plurality of digital assets 
for later retrieval, said method comprising: 

selecting (102) a plurality of digital assets; 

generating (104) a cryptographic hash asset 
identifier for each of said digital assets; 

generating (106) a descriptor file that includes
said cryptographic hash asset identifiers;

generating (108) a cryptographic hash asset list
identifier for said descriptor file; and

storing said digital assets, said descriptor file and
said cryptographic hash asset list identifier in secure
locations, whereby said cryptographic hash asset list
identifier may be used at a later time to retrieve
said assets.

2. A method as recited in claim 1 further comprising: 

translating (110) said generated cryptographic 
hash asset list identifier into a human-readable 
alphanumeric form, whereby said translated 
cryptographic hash asset list identifier may be 
easily transferred and stored. 

3. A method as recited in claim 1 further comprising: 

identifying a file directory structure that is ar-
ranged to organize said files; and

including said file directory structure and its re-
lationship to said files in said descriptor file,
whereby said generated cryptographic hash
asset list identifier further identifies said file di-
rectory structure of said files.

4. A method as recited in claim 1 further comprising:

retrieving said cryptographic hash asset list
identifier; and

signaling a request for said descriptor file iden-
tified by said cryptographic hash asset list iden-
tifier.

5. A method as recited in claim 1 wherein each digital
asset has associated meta data, and said element
of creating includes:

creating a descriptor file that includes said cryp-
tographic hash asset identifiers and said meta
data associated with said files, whereby said
cryptographic hash asset list identifier may be
used at a later time to retrieve said files and
their associated meta data.

6. A method as recited in claim 1 wherein said crypto-
graphic hash function is the MD5 algorithm.

7. A method as recited in claim 1 further comprising:

associating said generated digital asset list
identifier with a file locator that indicates a po-
tential location of said digital asset list; and

transferring said generated digital asset list
identifier along with said file locator to a secure
location or a recipient user.

8. A method as recited in claim 1 further comprising:

determining whether to retrieve said digital as-
sets;

retrieving said digital asset list identifier when
it is determined to retrieve said digital assets;
and

broadcasting a request for said digital asset list
identified by said digital asset list identifier.

9. A method as recited in claim 7 further comprising:

receiving said digital asset list including said
digital asset identifiers; and

verifying that said digital asset list is correct by
regenerating said digital asset list identifier
based upon said received digital asset list.

10. A method as recited in claim 9 further comprising: 

submitting a request for said digital assets iden- 
tified by said digital asset identifiers in said re- 
ceived digital asset list; 

receiving said requested digital assets; and 

verifying that said digital assets are correct by 
regenerating a digital asset identifier for each 
of said digital assets. 

11. A method as recited in claim 1 wherein the crypto- 
graphic hash asset identifier is computed from a 
portion of the contents of said selected digital asset. 

12. A method as recited in claim 1 wherein said digital 
asset has associated meta data, said method fur- 
ther comprising: 

storing in said descriptor file said meta data as- 
sociated with said digital asset; and 

obtaining said descriptor file including said me- 
ta data, whereby said associated meta data is 
obtained along with said copy of said digital as- 
set. 

13. A method as recited in claim 1 for retrieving a plu- 
rality of desired digital assets from a location com- 
prising: 

receiving a digital asset list identifier that 
uniquely identifies a descriptor file correspond- 
ing to said desired digital assets; 

retrieving said descriptor file using said digital 
asset list identifier, said descriptor file including 
a digital asset identifier for each of said desired 
digital assets, each of said digital asset identi- 
fiers uniquely identifying one of said desired 
digital assets; 

retrieving a second plurality of digital assets us- 
ing said digital asset identifiers; and 

verifying that said retrieved digital assets cor- 
respond to said desired digital assets using 
said digital asset identifiers. 

14. A method as recited in claim 13 wherein said veri- 
fying includes: 

computing a new digital asset identifier for each
of said retrieved digital assets using the same
cryptographic hash function used to compute
said digital asset identifier for each of said de-
sired digital assets; and

comparing said new digital asset identifiers of
said retrieved digital assets to said digital asset
identifiers of said descriptor file, whereby a new
digital asset identifier that matches a digital as-
set identifier for a given one of said digital as-
sets indicates that said digital asset is verified.

15. A method as recited in claim 13 further comprising:

reading a file directory structure arranged to or-
ganize said desired files from said descriptor
file; and

placing said retrieved files into a file hierarchy
similar to said file directory structure.

16. A method as recited in claim 13 further comprising:

signaling a request for said desired files identi-
fied by said file identifiers over a computer net-
work; and

receiving said second plurality of files from lo-
cations on said computer network.

17. A method as recited in claim 13 further comprising: 

verifying that said descriptor file is correct by 
regenerating said file list identifier based upon 
said retrieved descriptor file. 

18. A method as recited in claim 13 further comprising:

retrieving said second plurality of files by using
a plurality of importer programs arranged to
search for said second plurality of files in pro-
gressively more remote locations.

19. A method as recited in claim 13 wherein each of
said files includes associated meta data and where-
in said descriptor file includes said associated meta
data along with each of said file identifiers, said
method further comprising:

obtaining said associated meta data for each of
said files.

20. A system for reliably transferring a digital computer
asset comprising:

a digital computer asset;

a descriptor file that includes a digital asset
identifier for said digital asset, said digital asset
identifier being the result of a cryptographic
hash function based upon a portion of the con-
tents of said digital computer asset;

a descriptor file identifier that is the result of a
cryptographic hash function based upon a por-
tion of the contents of said descriptor file;

an importer program arranged to accept said
file identifier from said descriptor file and to re-
trieve said computer file using said file identifi-
er, whereby said computer file may be reliably
identified as the file to be transferred.

21. A system as recited in claim 20 wherein said cryp-
tographic hash function is the MD5 algorithm.

22. A system as recited in claim 20 further comprising:

a plurality of computer files, and wherein said
descriptor file includes a file directory structure
that is arranged to organize said computer files,
whereby said descriptor file identifier further
identifies said file directory structure of said
computer files.

23. A system as recited in claim 20 further comprising:

a series of importer programs arranged to
search for said computer file in progressively
more remote locations.

24. A system as recited in claim 20 wherein said de-
scriptor file further includes: meta data associated
with said file.
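For illustration only, the retrieval recited in the claims above, in which a series of importer programs searches progressively more remote locations and each retrieved asset is verified by regenerating its identifier, might be sketched in Python as follows. The names `make_importer` and `retrieve`, and the use of in-memory dictionaries in place of real local or networked stores, are assumptions of this sketch rather than features of the claimed importers.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional

# An importer maps an identifier to the asset bytes, or None if not found.
Importer = Callable[[str], Optional[bytes]]


def make_importer(store: Dict[str, bytes]) -> Importer:
    """Wrap a dict as an importer; a real one might query a disk or network."""
    def importer(identifier: str) -> Optional[bytes]:
        return store.get(identifier)
    return importer


def retrieve(identifier: str, importers: Iterable[Importer]) -> bytes:
    """Try importers from nearest to most remote.

    Content is accepted only if its regenerated MD5 identifier matches the
    requested identifier; otherwise the next importer is tried.
    """
    for importer in importers:
        data = importer(identifier)
        if data is not None and hashlib.md5(data).hexdigest() == identifier:
            return data
    raise KeyError(identifier)


payload = b"quarterly report"
wanted = hashlib.md5(payload).hexdigest()

local_cache = make_importer({wanted: b"corrupted copy"})  # fails verification
remote_store = make_importer({wanted: payload})           # verified match

result = retrieve(wanted, [local_cache, remote_store])
```

In this sketch a corrupted local copy is skipped rather than returned, so the caller receives either content that matches the identifier or an error, whichever location ultimately supplies it.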


